{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accessing ICESat-2 Data\n",
    "\n",
    "This notebook covers ICESat-2 data query and access from the NASA NSIDC DAAC (NASA National Snow and Ice Data Center Distributed Active Archive Center). Specifically, it illustrates a recommended pathway to obtain data programmatically using the Python library `icepyx`.\n",
    "\n",
    "#### Credits\n",
    "* Notebook by: Jessica Scheick and Amy Steiker\n",
    "* Source material: [NSIDC DAAC data access with icepyx](https://github.com/icesat2py/icepyx/blob/master/doc/examples/ICESat-2_DAAC_DataAccess_Example.ipynb) by Jessica Scheick\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Objectives\n",
    "\n",
    "* Become familiar with the background, scope, and goals of the icepyx library.\n",
    "* Learn how to interact with icepyx to search, order, and download subsetted ICESat-2 data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Motivation and Background\n",
    "\n",
    "On Friday, we explored various options for data access and visualization. Specifically, we spent some time using OpenAltimetry to interactively discover and plot data. But now we're ready to access the data directly and conduct our own analysis on it. This calls for a programmatic way to obtain our datasets and get them into our analysis environment. Although there are many ways to do this, here we are going to show examples using `icepyx`.\n",
    "\n",
    "`icepyx` is a wrapper on top of the NSIDC API, created specifically to make programmatic access to ICESat-2 data easier. The library is open-source, community developed and driven, and was created in response to user challenges faced at the 2019 ICESat-2 Hackweek. Under continuous development, we think you'll find `icepyx` a useful resource and encourage you to contribute your ideas, code, and feedback to enhance its use throughout the entire ICESat-2 workflow as depicted below. We hope that this year's Hackweek expands our ICESat-2 data-user community and sparks future feature development, both through hacking projects and beyond."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_path= '/home/jovyan/data-access/supporting_docs/IS2_data_access_options_d3.png'\n",
    "from IPython.display import Image\n",
    "Image(img_path) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## NSIDC DAAC Programmatic Data Access using icepyx\n",
    "\n",
    "### Import packages, including icepyx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from icepyx import icesat2data as ipd\n",
    "import os\n",
    "import shutil\n",
    "from pprint import pprint\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Key Steps for Programmatic Data Access\n",
    "\n",
    "There are several key steps for accessing data from the NSIDC API:\n",
    "1. Define your parameters (spatial, temporal, dataset, etc.)\n",
    "2. Query the NSIDC API to find out more information about the dataset\n",
    "3. Log in to NASA Earthdata\n",
    "4. Define additional parameters (e.g. subsetting/customization options)\n",
    "5. Order your data\n",
    "6. Download your data\n",
    "\n",
    "We'll go through each of these steps during this tutorial, at the end summarizing how `icepyx` streamlines this process into a minimal number of lines of code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 1: Create an ICESat-2 data object with the desired search parameters\n",
    "\n",
    "There are three required inputs:\n",
    "- `short_name` = the dataset of interest, known as its \"short name\".\n",
    "See https://nsidc.org/data/icesat-2/data-sets for a list of the available datasets. Dataset shortnames, or IDs, can also be found at the top of each individual dataset landing page on NSIDC's website.\n",
    "\n",
    "- `spatial extent` = a region of interest to search within. This can be entered as a bounding box, polygon vertex coordinate pairs, or a polygon geospatial file (currently shp, kml, and gpkg are supported, though gpkg is still untested).\n",
    "    - Bounding box: Given in decimal degrees for the lower left longitude, lower left latitude, upper right longitude, and upper right latitude. West longitudes and south latitudes should be provided as negative degrees.\n",
    "    - Polygon vertices: Given as longitude, latitude coordinate pairs of decimal degrees with the last entry a repeat of the first.\n",
    "    - Polygon file: A string containing the full file path and name.\n",
    "\n",
    "\n",
    "- `date_range` = the date range for which you would like to search for results. Must be formatted as a set of 'YYYY-MM-DD' strings separated by comma. \n",
    "\n",
    "There are also several optional inputs to allow the user finer control over their search.\n",
    "- `start_time` = start time to search for data on the start date. If no input is given, this defaults to 00:00:00.\n",
    "- `end_time` = end time for the end date of the temporal search parameter. If no input is given, this defaults to 23:59:59. Times must be input as 'HH:mm:ss' strings.\n",
    "- `version` = What version of the dataset to use, input as a numerical string. If no input is given, this value defaults to the most recent version of the dataset specified in `short_name`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*IMPORTANT NOTES* \n",
    "\n",
    "- **Spatial extents:** The spatial extent is a required input for data access. Much like how your spatial extent in OpenAltimetry showed which tracks were available, this input provides an easy way for the NSIDC API to search through the ICESat-2 metadata and determine which granules *might* have some coverage there (without having to open all the individual granules, which would take a lot of computing resources). Then, this same spatial region will be used a second time to actually extract the data for the region you're interested in (which is why you might get a message during ordering saying a given granule didn't have any data in your region of interest). See Figure 1 in the [NSIDC Programmatic Access Guide](https://nsidc.org/support/how/how-do-i-programmatically-request-data-services) to view an example of the concept of spatial filtering (granules overlapping the inputted spatial extent based on the granule metadata), and spatial subsetting (data values within each granule are cropped, or extracted, based on the inputted spatial extent). \n",
    "\n",
    "- **Version warnings:** Version 001 is used as an example in the below cell to illustrate the warning issued by not using the most recent version. However, using it will cause 'no results' errors in granule ordering for some search parameters. These issues have been resolved in later versions of the datasets, so it is best to use the most recent version where possible. We will re-initialize `region_a` in a later cell to use for the rest of this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is an example using a spatial extent over Pine Island Glacier that was used as the same input for the land ice tutorial during this Hackweek."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pine Island Glacier\n",
    "short_name = 'ATL06'\n",
    "spatial_extent = [-102, -76, -98, -74.5]\n",
    "date_range = ['2019-06-18','2019-06-25']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the data object using our inputs. Note the version warning if we specify an old version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range, \\\n",
    "                           start_time='03:30:00', end_time='21:30:00', version='001')\n",
    "\n",
    "print(region_a.dataset)\n",
    "print(region_a.dates)\n",
    "print(region_a.start_time)\n",
    "print(region_a.end_time)\n",
    "print(region_a.dataset_version)\n",
    "print(region_a.spatial_extent)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now re-populate the data object. Note that with no version or time specified, the defaults are used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Formatted parameters and function calls allow us to see the the properties of the data object we have created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(region_a.dataset)\n",
    "print(region_a.dates)\n",
    "print(region_a.start_time)\n",
    "print(region_a.end_time)\n",
    "print(region_a.dataset_version)\n",
    "print(region_a.spatial_extent)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.visualize_spatial_extent()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you could also just create the data object without creating named variables first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# region_a = ipd.Icesat2Data('ATL06',[-102, -76, -98, -74.5],['2019-06-18', '2019-06-25'], \\\n",
    "#                            start_time='00:00:00', end_time='23:59:59', version='003')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Built in methods allow us to get more information about our dataset\n",
    "In addition to viewing the stored object information shown above (e.g. dataset, start and end date and time, version, etc.), we can also request summary information about the dataset itself or confirm that we have manually specified the latest version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(region_a.latest_version())\n",
    "region_a.dataset_summary_info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the summary does not provide all of the information you are looking for, or you would like to see information for previous versions of the dataset, all available metadata for the collection dataset is available in a readable format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "region_a.dataset_all_info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 2: Querying a dataset\n",
    "In order to search the dataset collection for available data granules, we need to build our search parameters. This is done automatically behind the scenes when you run `region_a.avail_granules()`, but you can also build and view them by calling `region_a.CMRparams`. These are formatted as a dictionary of key:value pairs according to the CMR documentation.\n",
    "\n",
    "***What is CMR?***\n",
    "\n",
    "*The Common Metadata Repository (CMR) is a high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records. These metadata records are registered, modified, discovered, and accessed through programmatic interfaces leveraging standard protocols and APIs. Note that not all NSIDC data can be searched at the file level using CMR, particularly those outside of the NASA DAAC program.*\n",
    "\n",
    "*CMR API documentation: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html*\n",
    "\n",
    "***What are granules?***\n",
    "\n",
    "*'Granules' are essentially data files with specific bounds. You can think of a granule as a snapshot in space and time, where each granule is stored in an individual file. In order to cover a large geographic area or have multiple snapshots in time, you would need to combine (mosaic) and/or compare multiple granules/files worth of data. Data storage and access using some of the other formats and tools described throughout this Hackweek (`Zarr`, `XArray`, `Dask`) are specifically designed to alleviate the problems introduced to cloud and parallel computing by virtue of the data being stored in granule/file-type pieces.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#build and view the parameters that will be submitted in our query\n",
    "region_a.CMRparams"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that our parameter dictionary is constructed, we can search the CMR database for the available granules. Granules returned by the CMR metadata search are automatically stored within the data object. Recall the search completed at this level relies completely on the granules' metadata. As a result, some (and in rare cases all) of the granules returned may not actually contain data in your specified region, particularly if the region is small or located near the boundaries of a given granule. If this is the case, the subsetter will not return any data when you actually place the order. A warning message will be issued during ordering for each granule to which this applies (but no message is output for successfully subsetted granules, so don't worry!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#search for available granules and provide basic summary info about them\n",
    "region_a.avail_granules()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#get a list of the available granule IDs that meet your search criteria\n",
    "region_a.avail_granules(ids=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#print detailed information about the returned search results\n",
    "region_a.granules.avail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 3: Log in to NASA Earthdata\n",
    "\n",
    "In order to download any data from NSIDC, we must first authenticate ourselves using a valid Earthdata login. This will start an active logged-in session to enable data download. Once you have successfully logged in for a given icesat2data instance, the session will be passed behind the scenes as needed for you to order and download data. Passwords are entered but not shown or stored in plain text by the system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill this in with your Earthdata Login user name and associated email\n",
    "\n",
    "earthdata_uid = ''\n",
    "email = ''\n",
    "region_a.earthdata_login(earthdata_uid, email)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# icepyx development login\n",
    "\n",
    "earthdata_uid = 'icepyx_devteam'\n",
    "email = 'icepyx.dev@gmail.com'\n",
    "region_a.earthdata_login(earthdata_uid, email)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 4: Additional Parameters and Subsetting\n",
    "\n",
    "Once we have generated our session, we must build the required configuration parameters needed to actually download data. These will tell the system how we want to download the data. As with the CMR search parameters, these will be built automatically when you run `region_a.order_granules()`, but you can also create and view them with `region_a.reqparams`. The default parameters, given below, should work for most users.\n",
    "- `page_size` = 10. This is the number of granules we will request per order.\n",
    "- `page_num` = 1. Determine the number of pages based on page size and the number of granules available. If no page_num is specified, this calculation is done automatically to set page_num, which then provides the number of individual orders we will request given the number of granules.\n",
    "- `request_mode` = 'async'\n",
    "- `agent` = 'NO'\n",
    "- `include_meta` = 'Y'\n",
    "\n",
    "#### More details about the **configuration parameters**\n",
    "`request_mode` is \"asynchronous\" by default, which allows concurrent requests to be queued and processed without the need for a continuous connection. In contrast, using a \"synchronous\" `request_mode` means that the request relies on a direct, continous connection between you and the API endpoint. Outputs are directly downloaded, or \"streamed\", to your working directory. For this tutorial, we will set the request mode to asynchronous.\n",
    "\n",
    "**Use the streaming `request_mode` with caution: While it can be beneficial to stream outputs directly to your local directory, note that timeout errors can result depending on the size of the request, and your request will not be queued in the system if NSIDC is experiencing high request volume. For best performance, NSIDC recommends setting `page_size=1` to download individual outputs, which will eliminate extra time needed to zip outputs and will ensure faster processing times per request.**\n",
    "\n",
    "\n",
    "Recall that we queried the total number and volume of granules prior to applying customization services. `page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB so we want to choose a granule count that provides us with a reasonable zipped download size. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(region_a.reqparams)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### **Subsetting**\n",
    "\n",
    "Note: The rest of this section will provide a very cursory overview of subsetting (what it is, how to subset in icepyx, common questions/issues). For a deeper dive into subsetting, please see our [Subsetting Tutorial Notebook](https://github.com/ICESAT-2HackWeek/data-access/blob/master/notebooks/03-Data_Access2_Subsetting.ipynb), which covers subsetting in more detail, including how to get a list of subsetting options, how to build your list of subsetting parameters, and how to generate a list of desired variables (most datasets have more than 200 variable fields!), including using pre-built default lists (these lists are still in progress and we welcome contributions!)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***What is SUBSETTING anyway?***\n",
    "\n",
    "*Anyone who's worked with geospatial data has probably encountered subsetting. Typically, we search for data wherever it is stored and download the chunks (aka granules, scenes, passes, swaths, etc.) that contain something we are interested in. Then, we have to extract from each chunk the pieces we actually want to analyze. Those pieces might be geospatial (i.e. an area of interest), temporal (i.e. certain months of a time series), and/or certain variables. This process of extracting the data we are going to use is called subsetting. See Figure 1 in the [NSIDC Programmatic Access Guide](https://nsidc.org/support/how/how-do-i-programmatically-request-data-services) for an illustration of this behavior.*\n",
    "\n",
    "*In the case of ICESat-2 data coming from the NSIDC DAAC, we can do this subsetting step on the data prior to download, reducing our number of data processing steps and resulting in smaller, faster downloads and storage.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to the required parameters (CMRparams and reqparams) that are submitted with our order, for ICESat-2 datasets we can also submit subsetting and reformatting parameters to NSIDC. This utilizes the NSIDC's built in subsetter to extract only the data you are interested (spatially, temporally, variables of interest, etc.). The advantages of using the NSIDC's subsetter include:\n",
    "* easily reproducible downloads, particularly when coupled with an icepyx data object\n",
    "* smaller file size, meaning faster downloads, less storage required, and no need to subset the data on your own\n",
    "* still easy to go back and order more data/variables with the same or similar search parameters\n",
    "* no extraneous data means you can move directly to analysis and easily navigate your dataset\n",
    "\n",
    "\n",
    "Certain subset parameters are specified by default unless `subset=False` is included as an input to `order_granules()` or `download_granules()` (which calls `order_granules()` under the hood).\n",
    "\n",
    "As for the CMR and required parameters, default subset parameters can be built and viewed using `subsetparams`. Where an input spatial file is used, rather than a bounding box or manually entered polygon, the spatial file will be used for subsetting (unless subset is set to False) but not show up in the `subsetparams` dictionary.\n",
    "\n",
    "`icepyx` also makes it easy to take advantage of the reformatting (e.g. file format conversion) options offered by NSIDC. These are covered in more detail in the [Subsetting Tutorial Notebook](https://github.com/ICESAT-2HackWeek/data-access/blob/master/notebooks/03-Data_Access2_Subsetting.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.subsetparams()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Discover Subsetting Options\n",
    "\n",
    "You can see what subsetting options are available for a given dataset by calling `show_custom_options()`. The options are presented as a series of headings followed by available values in square brackets. Headings are:\n",
    "* **Subsetting Options**: whether or not temporal and spatial subsetting are available for the dataset\n",
    "* **Data File Formats (Reformatting Options)**: return the data in a format other than the native hdf5 (submitted as a key=value kwarg to `order_granules(format='NetCDF4-CF')`)\n",
    "* **Data File (Reformatting) Options Supporting Reprojection**: return the data in a reprojected reference frame. These options will be supported in the future for gridded ICESat-2 L3B datasets, once those are produced and made available.\n",
    "* **Data File (Reformatting) Options NOT Supporting Reprojection**: data file formats that cannot be delivered with reprojection\n",
    "* **Data Variables (also Subsettable)**: a dictionary of variable name keys and the paths to those variables available in the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "region_a.show_custom_options(dictview=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### About Data Variables in an Icesat2Data object\n",
    "Each `icesat2data` object has an associated `order_vars` parameter, which is for interacting with variables during data querying, ordering, and downloading activities. `order_vars.wanted` holds the user's list to be submitted to the NSIDC subsetter and download a smaller, reproducible dataset\n",
    "\n",
    "Each variables parameter has methods to:\n",
    "* get variables available from the NSIDC (`avail()` method/attribute).\n",
    "* append new variables to the wanted list (`append()` method).\n",
    "* remove variables from the wanted list (`remove()` method)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Determine what variables are available\n",
    "There are multiple ways to get a complete list of available variables.\n",
    "\n",
    "1. `region_a.order_vars.avail`, a list of all valid path+variable strings\n",
    "2. `region_a.show_custom_options(dictview=True)`, all available subsetting and data transformation options, as you ran above\n",
    "3. `region_a.order_vars.parse_var_list(region_a.order_vars.avail)`, a dictionary of variable:paths key:value pairs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By passing the boolean `options=True` to the `avail` method, you can obtain lists of unique possible variable inputs (var_list inputs) and path subdirectory inputs (keyword_list and beam_list inputs) for your dataset. These can be helpful for building your wanted variable list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.order_vars.avail(options=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Modifying your wanted variable list\n",
    "\n",
    "Generating and modifying your variable request list, which is stored in `region_a.order_vars.wanted`, is controlled by the `append` and `remove` functions that operate on `region_a.order_vars.wanted`. The input options to `append` are as follows (the full documentation for this function can be found by executing `help(region_a.order_vars.append)`).\n",
    "* `defaults` (default False) - include the default variable list for your dataset (not yet fully implemented for all datasets; please submit your default variable list for inclusion!)\n",
    "* `var_list` (default None) - list of variables (entered as strings)\n",
    "* `beam_list` (default None) - list of beams/profiles (entered as strings)\n",
    "* `keyword_list` (default None) - list of keywords (entered as strings); use `keyword_list=['']` to obtain a list of available keywords\n",
    "\n",
    "Similarly, the options for `remove` are:\n",
    "* `all` (default False) - reset `region_a.order_vars.wanted` to None\n",
    "* `var_list` (as above)\n",
    "* `beam_list` (as above)\n",
    "* `keyword_list` (as above)\n",
    "\n",
    "A few brief examples of this functionality are below. Please see the [Subsetting Tutorial Notebook](https://github.com/ICESAT-2HackWeek/data-access/blob/master/notebooks/03-Data_Access2_Subsetting.ipynb) for a more thorough set of examples for both beam and profile datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example 1: create a default variable list**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "region_a.order_vars.append(defaults=True)\n",
    "pprint(region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example 2: clear the variable list**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.order_vars.remove(all=True)\n",
    "pprint(region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example 3: choose variables**\n",
    "\n",
    "Add all `latitude` variables across all six beam groups. Note that the additional required variables for time and spacecraft orientation are included by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.order_vars.append(var_list=['latitude'])\n",
    "pprint(region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example 4: add/remove selected beams+variables**\n",
    "\n",
    "Add `longitude` for beams `gt1l` and `gt3l` and remove `latitude` for beam `gt2l`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.order_vars.append(beam_list=['gt1l', 'gt3l'],var_list=['longitude'])\n",
    "region_a.order_vars.remove(beam_list=['gt2l'], var_list=['latitude'])\n",
    "pprint(region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example 5: use default variables for order download**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "region_a.order_vars.remove(all=True)\n",
    "region_a.order_vars.append(defaults=True)\n",
    "pprint(region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Applying variable subsetting to your order and download\n",
    "\n",
    "In order to have your wanted variable list included with your order, you must pass it as a keyword argument to the `subsetparams()` attribute (next cell block) or the `order_granules()` or `download_granules()` (Steps 4 and 5, below). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.subsetparams(Coverage=region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 4: Place the Data Order\n",
    "We can send the order to NSIDC using the `order_granules()` function. Information about the granules ordered and their status will be printed automatically as well as emailed to the address provided. Email notifications can be set to false but they remain on as the default behavior. Additional information on the order, including request URLs (like the direct API request that is submitted), can be viewed by setting the optional keyword input 'verbose' to True. The verbose output can be helpful for troubleshooting (e.g. if you get errors from the subsetter) and relates to the output you'll see from other NSIDC resources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Without variable subsetting, or with variable subsetting if you have run region_a.subsetparams(Coverage=region_a.order_vars.wanted):\n",
    "region_a.order_granules()\n",
    "\n",
    "# # With variable subsetting, if this is your first use of the Coverage keyword argument (kwarg):\n",
    "# region_a.order_granules(Coverage=region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Set email notifications to false. These are sent by default:\n",
    "# region_a.order_granules(email=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Additional information on the order including the API request itself using the verbose keyword:\n",
    "# region_a.order_granules(verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View a short list of order IDs:\n",
    "region_a.granules.orderIDs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Step 5: Download the Order\n",
    "Finally, we can download our order to a specified directory (which needs to have a full path but doesn't have to point to an existing directory) and the download status will be printed as the program runs. Additional information is again available by using the optional boolean keyword `verbose`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = './download'\n",
    "\n",
    "# without variable subsetting, or with variable subsetting if you have run region_a.order_granules(Coverage=region_a.order_vars.wanted)\n",
    "region_a.download_granules(path)\n",
    "\n",
    "# with variable subsetting, if this is your first use of the Coverage keyword argument (kwarg)\n",
    "# region_a.download_granules(Coverage=region_a.order_vars.wanted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your data download has been interrupted for any reason (but your order was completed), you can restart the download and it will pick up where it left off. If you have been forced to restart your kernel, you will need to re-initialize the icesat2data object (`region_a=ipd.Icesat2Data(short_name, spatial_extent...)`) and log in to Earthdata before skipping to this step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "region_a.download_granules(path,restart=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The downloaded data files will automatically be unzipped and moved into the single directory specified in `path`. We are currently working on adding a keyword option that gives users the ability to not unzip their downloaded files."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    "------------------\n",
    "### Quick-Start Quide\n",
    "The entire process of getting ICESat-2 data (from query to download) can ultimately be accomplished in three minimal lines of code:\n",
    "\n",
    "`region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range)`\n",
    "\n",
    "`region_a.earthdata_login(earthdata_uid, email)`\n",
    "\n",
    "`region_a.download_granules(path)`\n",
    "\n",
    "where the function inputs are described throughout this tutorial.\n",
    "\n",
    "**This minimal example includes spatial and temporal subsetting (default) based on your input spatial and temporal extents. It does NOT include non-default subsetting and customization (e.g. variables, reformatting), which are covered in more detail in the [Subsetting Tutorial Notebook](https://github.com/ICESAT-2HackWeek/data-access/blob/master/notebooks/03-Data_Access2_Subsetting.ipynb).** The motivation for using some of the more detailed steps outlined in this notebook are that they provide the user with much more control over the data they download, ultimately saving time and effort later on in the processing and storage pipelines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
